Abstract. We consider the effective behaviour of a rate-independent process
when it is placed in contact with a heat bath. The method used to "thermalize" 
the process is an interior-point entropic regularization of the Moreau-Yosida 
incremental formulation of the unperturbed process. It is shown that the heat 
bath destroys the rate independence in a controlled and deterministic way, and 
that the effective dynamics are those of a non-linear gradient descent in the 
original energetic potential with respect to a different and non-trivial effective 
dissipation potential. 



1. Introduction and Outline 

In [5, 14], it was proposed that a suitable model for the effect of a heat bath (i.e. 
the application of statistically disordered energy) on a gradient descent is a time- 
incremental variational problem in which, in each time step, the usual work done 
competes with an entropy term that penalizes coherent, deterministic evolutions. In 
the case of linear kinetics (two-homogeneous dissipation), this method is equivalent
to the one used in [4] to generate the Fokker-Planck equation for an Itô stochastic
gradient descent. This paper examines the case of one-homogeneous dissipation 
and generalizes the results of [14]. 

As outlined in Section 2, the discrete-time formulation of a rate-independent 
evolution in an energetic potential $E(t, x)$ with respect to a one-homogeneous
dissipation potential $\Psi(x)$ is to find, given the state $x_i$ at time $t_i$, the state
$x_{i+1}$ at time $t_{i+1}$ that minimizes

\[ W_{i+1}(x_i, x_{i+1}) := E(t_{i+1}, x_{i+1}) - E(t_i, x_i) + \Psi(x_{i+1} - x_i). \tag{1.1} \]

To represent the influence of a heat bath of "intensity" $\theta > 0$ upon this evolution,
we consider an associated variational problem (2.12) for the probability distribution
of the random next state of the system, the solution of which is the Gibbsian density

\[ \rho_{i+1}(x_{i+1} \mid x_i) \propto \exp\left( - \frac{W_{i+1}(x_i, x_{i+1})}{\theta \Delta t_{i+1}} \right). \tag{1.2} \]

This paper shows that, under suitable assumptions on $E$ and $\Psi$, in the limit as
the time step tends to zero, this procedure yields a non-trivial deterministic limiting
process. This limiting process is a gradient descent in the original energetic
potential $E$ but with respect to a new dissipation potential $\Psi_0$ that is a non-linear
transformation (the Cramér transform) of the original one, $\Psi$. As demonstrated
in [13, 14], this non-linear gradient descent arises in mechanical contexts such as 
Andrade creep. 

Rate-independent processes play an important role in the modelling of many 
physical phenomena such as plasticity and phase transformations in elastic solids, 
electromagnetism, dry friction on surfaces, and pinning problems in superconductivity.
It is widely accepted that rate-independent processes, which describe
mesoscopic or macroscopic properties, are limit processes for more complicated mi- 
crostructural evolutions: the rate-independent model arises in the limit of vanishing 
inertia, relaxation time and thermal effects. Hence, this paper is concerned with 
the relaxation of the third of these limiting assumptions. 

In Section 2, the notation and set-up of the problem are given, including a 
brief review of the necessary elements of the theories of gradient descents and rate- 
independent processes. In Section 3, some formal calculations are performed that 
motivate the introduction of the effective dissipation potential $\Psi_0$. In Section 4, $\Psi_0$ is
defined more formally, its properties examined, and the main convergence theorem 
(Theorem 4.4) is stated. Some conclusions and outlook for future work are given 
in Section 5. The proofs of the various results are given in Section 6. 

2. Notation and Set-Up of the Problem 

2.1. Gradient Descents. Both the unperturbed and perturbed processes of study 
in this paper are examples of gradient descents. The standard example of a gradient 
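descent is the ordinary differential equation $\dot{x}(t) = -\nabla E(t, x(t))$ for $x \colon [0, T] \to \mathbb{R}^n$,
which is characterized by the energy evolution law

\[ \frac{\mathrm{d}}{\mathrm{d}t} E(t, x(t)) = (\partial_t E)(t, x(t)) - \frac{1}{2} |\dot{x}(t)|^2 - \frac{1}{2} |\nabla E(t, x(t))|^2. \tag{2.1} \]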

In general, gradient descents may be considered on any metric space $(\mathcal{Q}, d)$; see
[1] for a comprehensive treatment. For the purposes of this paper, however, it is
enough to consider the case in which $\mathcal{Q}$ is a subset of a Banach space $(\mathcal{X}, \|\cdot\|)$.

A gradient descent in $\mathcal{Q}$ is characterized by an initial condition $x_0 \in \mathcal{Q}$, an
energetic potential $E \colon [0, T] \times \mathcal{Q} \to \mathbb{R}$, and a dissipation potential $\Psi \colon \mathcal{X} \to [0, +\infty]$,
which is convex and satisfies $\Psi(0) = 0$. For simplicity, $E(t, x)$ is assumed to be
differentiable with respect to both $t$ and $x$.

Definition 2.1. An absolutely continuous curve $x \colon [0, T] \to \mathcal{Q}$ is said to be a
gradient descent in $E$ with respect to $\Psi$ and starting at $x_0$ if

(1) $x(0) = x(0+) = x_0$;

(2) $t \mapsto E(t, x(t))$ is absolutely continuous;

(3) the (differential) energy inequality

\[ \frac{\mathrm{d}}{\mathrm{d}t} E(t, x(t)) \leq (\partial_t E)(t, x(t)) - \Psi(\dot{x}(t)) - \Psi^\star(-\mathrm{D}E(t, x(t))) \tag{2.2} \]

is satisfied for almost every $t \in [0, T]$, where $\Psi^\star \colon \mathcal{X}^* \to [0, +\infty]$ denotes
the convex conjugate (Legendre-Fenchel transform) of $\Psi$, defined by

\[ \Psi^\star(\ell) := \sup \{ \langle \ell, x \rangle - \Psi(x) \mid x \in \mathcal{X} \}. \tag{2.3} \]

The condition (2.2) is the appropriate generalization of (2.1); the classical case of
linear kinetics is that in which the dissipation potential is given by $\Psi(x) := \frac{1}{2} \|x\|^2$.
Shortly, we shall consider rate-independent processes, in which $\Psi$ is positively
homogeneous of degree one; the limiting processes of this paper will be gradient descents
for which $\Psi$ is not homogeneous of any degree.

2.2. Incremental Formulation. The analysis and numerical approximation of 
gradient descents are often performed using a discrete-time incremental variational 
formulation. At each time step, the problem is to minimize the Moreau-Yosida 
regularization of $E(t_i, \cdot)$ [10, 15]. $P$ will denote a partition of the interval of time
$[0, T]$, i.e. a finite strictly increasing sequence

\[ P = \{ 0 = t_0 < t_1 < \dots < t_N = T \}, \tag{2.4} \]

where $\Delta t_i := t_i - t_{i-1}$ and $[P]$ denotes the mesh size of $P$:

\[ [P] := \max_{i = 1, \dots, N} |\Delta t_i|. \tag{2.5} \]

The Moreau-Yosida scheme is a causal sequence of variational problems, the 
Euler-Lagrange equations of which are the equations of motion for the original 
gradient descent: 

Definition 2.2. The Moreau-Yosida incremental formulation of the gradient descent
in $E$ with respect to $\Psi$ is to solve the following sequence of minimization
problems: given an initial condition $x_0^{(P)} = x_0 \in \mathcal{Q}$, find, for $i = 0, \dots, N - 1$,
$x_{i+1}^{(P)} \in \mathcal{Q}$ to minimize

\[ E(t_{i+1}, x_{i+1}^{(P)}) - E(t_i, x_i^{(P)}) + \Delta t_{i+1} \Psi\left( \frac{x_{i+1}^{(P)} - x_i^{(P)}}{\Delta t_{i+1}} \right). \tag{2.6} \]

By abuse of notation, let $x^{(P)} \colon [0, T] \to \mathcal{Q}$ also denote the càdlàg piecewise-constant
interpolation of the sequence $(x_i^{(P)})_{i=0}^{N}$, as defined by

\[ x^{(P)}(t) := x_i^{(P)} \quad \text{for } t \in [t_i, t_{i+1}). \tag{2.7} \]
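As a rough illustration of the scheme (2.6), the following Python sketch treats a one-dimensional example. The quadratic energy $E(t, x) = a x^2 / 2 - \ell(t) x$, the dissipation $\Psi(v) = \sigma |v|$, the load and all function names are assumptions made for this illustration and are not taken from the text. For this particular choice the minimizer in (2.6) is available in closed form: it is the projection of the previous state onto the interval $[(\ell - \sigma)/a, (\ell + \sigma)/a]$. Because $\Psi$ is one-homogeneous, $\Delta t \, \Psi(v / \Delta t) = \Psi(v)$, so the time step drops out of the minimization.

```python
import math

def moreau_yosida_step(x_prev, load_next, a=1.0, sigma=1.0):
    """One step of (2.6) for E(t, x) = a*x**2/2 - load*x and Psi(v) = sigma*|v|.

    The minimizer is the projection of x_prev onto the interval
    [(load - sigma)/a, (load + sigma)/a] (hypothetical 1D example).
    """
    lower = (load_next - sigma) / a
    upper = (load_next + sigma) / a
    return min(max(x_prev, lower), upper)

def incremental_solution(x0, times, load_fn, a=1.0, sigma=1.0):
    """Solve the sequence of minimization problems on the partition `times`."""
    xs = [x0]
    for t_next in times[1:]:
        xs.append(moreau_yosida_step(xs[-1], load_fn(t_next), a, sigma))
    return xs

def load(t):
    # Illustrative periodic load; an assumption, not from the source.
    return 3.0 * math.sin(2.0 * math.pi * t)

coarse = [i / 4 for i in range(5)]      # contains the turning points of the load
fine = [i / 400 for i in range(401)]
xs_coarse = incremental_solution(0.0, coarse, load)
xs_fine = incremental_solution(0.0, fine, load)
```

In this example the coarse partition contains the turning points of the load, and both partitions reach the same state at the end of the loading cycle, which is one way of seeing the absence of an intrinsic time-scale.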

2.3. Rate-Independent Processes. A rate-independent process is an evolutionary
system that has no intrinsic time-scale: it "reacts only as quickly as its
time-dependent inputs". Put another way, the solution operator commutes with monotone
reparametrizations of time. There is much literature on the theory, modelling
and analysis of rate-independent processes and the connections with gradient de- 
scent theory; see e.g. [6, 7, 8]. 

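Definition 2.3. Let $\mathcal{Q}$ and $\mathcal{Q}^*$ be topological spaces. Suppose that each choice of
initial condition $x_0 \in \mathcal{Q}$ and each input $\ell \colon [t_0, t_1] \to \mathcal{Q}^*$ determines a set of outputs

\[ O([t_0, t_1], x_0, \ell) \subseteq \{ x \colon [t_0, t_1] \to \mathcal{Q} \mid x(t_0) = x_0 \}. \]

The input-output relationship is said to be rate-independent if, for every strictly
increasing and surjective $\varphi \colon [t_0', t_1'] \to [t_0, t_1]$,

\[ x \in O([t_0, t_1], x_0, \ell) \iff x \circ \varphi \in O([t_0', t_1'], x_0, \ell \circ \varphi). \]

The relationship is said to determine a (possibly multi-valued) evolutionary system
if concatenations and restrictions of solutions are also solutions, i.e.

\[ \left. \begin{array}{l} x \in O([t_0, t_1], x_0, \ell|_{[t_0, t_1]}), \\ \tilde{x} \in O([t_1, t_2], x_1, \ell|_{[t_1, t_2]}) \text{ and } x(t_1) = x_1 \end{array} \right\} \implies \hat{x} \in O([t_0, t_2], x_0, \ell), \quad \text{where } \hat{x}(t) := \begin{cases} x(t), & \text{if } t \in [t_0, t_1], \\ \tilde{x}(t), & \text{if } t \in [t_1, t_2]; \end{cases} \]

and

\[ \left. \begin{array}{l} x \in O([t_0, t_1], x_0, \ell), \\ {[s_0, s_1]} \subseteq [t_0, t_1] \text{ and } y_0 := x(s_0) \end{array} \right\} \implies x|_{[s_0, s_1]} \in O([s_0, s_1], y_0, \ell|_{[s_0, s_1]}). \]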

In the case of gradient descents on (subsets of) Banach spaces as described above,
rate-independence corresponds to the dissipation potential $\Psi \colon \mathcal{X} \to [0, +\infty]$ being
positively homogeneous of degree one, i.e.

\[ \Psi(\alpha x) = \alpha \Psi(x) \quad \text{for all } \alpha > 0, \ x \in \mathcal{X}. \tag{2.8} \]

It will be assumed that $\Psi$ is both continuous and non-degenerate: i.e. there exist
constants $c_\Psi, C_\Psi > 0$ such that

\[ c_\Psi \|x\| \leq \Psi(x) \leq C_\Psi \|x\| \quad \text{for all } x \in \mathcal{X}. \tag{2.9} \]

This is equivalent to assuming that $\Psi$ is the convex conjugate of the characteristic
function of a suitable subset of $\mathcal{X}^*$:

\[ \Psi(x) = \chi_{\mathcal{E}}^\star(x) = \sup \{ \langle \ell, x \rangle \mid \ell \in \mathcal{E} \} \tag{2.10} \]

for some bounded, closed and convex set $\mathcal{E} \subseteq \mathcal{X}^*$ having $0$ as an interior point. $\mathcal{E}$ is
known as the elastic region and its frontier $\partial \mathcal{E}$ is known as the yield surface. The
set

\[ S(t) := \{ x \in \mathcal{Q} \mid -\mathrm{D}E(t, x) \in \mathcal{E} \} \tag{2.11} \]

is the collection of (locally) stable states at time t; since in this paper the energy 
E will always be convex, the distinction between global and local stability will not 
matter. 

As shown in [9, Theorem 7.1], the rate-independent problem is well-posed in the
case that $\mathcal{Q} = \mathcal{X}$ is a separable and reflexive Banach space; that $\Psi$ satisfies (2.8)
and (2.9); and that $E(t, \cdot)$ is of smoothness class $C^3$, with the eigenvalues of $\mathrm{D}^2 E$
bounded below by some $\gamma > 0$, uniformly in time and space.

2.4. Thermalized Gradient Descents: Entropic Regularization. Consider
a gradient descent in $\mathbb{R}^n$ with respect to an energy $E$ and dissipation $\Psi$. The
corresponding Moreau-Yosida incremental problem is as follows: given the state $x_i$
at time $t_i$, the aim is to find $x_{i+1}$ to minimize

\[ W_{i+1}(x_i, x_{i+1}) := E(t_{i+1}, x_{i+1}) - E(t_i, x_i) + \Delta t_{i+1} \Psi\left( \frac{x_{i+1} - x_i}{\Delta t_{i+1}} \right). \]



To model the effect of a heat bath on the gradient descent, we pass to an extended
problem, in which the state of the system at time $t_i$ is a random variable $X_i$.

Given that the random state $X_i$ assumes the value $x_i$ at time $t_i$, the random
next state $X_{i+1}$ for time $t_{i+1}$ is posited to have the conditional probability density
function $\rho_{i+1}(\cdot \mid x_i) \in L^1(\mathbb{R}^n, \lambda; [0, +\infty])$ that minimizes

\[ \int_{\mathbb{R}^n} \bigl( W_{i+1}(x_i, \cdot) \, \rho_{i+1}(\cdot \mid x_i) + \varepsilon_{i+1} \, \rho_{i+1}(\cdot \mid x_i) \log \rho_{i+1}(\cdot \mid x_i) \bigr) \, \mathrm{d}\lambda, \tag{2.12} \]

where $\lambda$ denotes Lebesgue measure. The parameter $\varepsilon_{i+1} > 0$ represents the intensity
of the heat bath to which the gradient descent is coupled; more precisely, $\varepsilon_{i+1}$
is the amount of (disordered) energy that the heat bath injects into the system over
the time interval $[t_i, t_{i+1}]$.

Equivalently to (2.12), given that the current state has probability density
function $\rho_i \in L^1(\mathbb{R}^n, \lambda; [0, +\infty])$, we may seek a joint probability density function
$\rho_{i,i+1} \in L^1(\mathbb{R}^n \times \mathbb{R}^n, \lambda \otimes \lambda; [0, +\infty])$ that has $\rho_i$ as its first marginal and minimizes

\[ \int_{\mathbb{R}^n \times \mathbb{R}^n} \bigl( W_{i+1} \, \rho_{i,i+1} + \varepsilon_{i+1} \, \rho_{i,i+1} \log \rho_{i,i+1} \bigr) \, \mathrm{d}(\lambda \otimes \lambda). \tag{2.13} \]

The connection between (2.12) and (2.13) is given by

\[ \rho_{i,i+1}(x_i, x_{i+1}) = \rho_i(x_i) \, \rho_{i+1}(x_{i+1} \mid x_i). \]

The minimizer of (2.12) is a Gibbs-Boltzmann-type conditional probability density
function (cf. [3, 4]):

\[ \rho_{i+1}(x_{i+1} \mid x_i) = \frac{\exp\bigl( -W_{i+1}(x_i, x_{i+1}) / \varepsilon_{i+1} \bigr)}{\int_{\mathbb{R}^n} \exp\bigl( -W_{i+1}(x_i, x_{i+1}) / \varepsilon_{i+1} \bigr) \, \mathrm{d}x_{i+1}}. \tag{2.14} \]
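The Gibbsian density (2.14) can be tabulated on a grid. The following Python sketch does this in one dimension under illustrative assumptions that are not part of the text: the energy $E(t, x) = a x^2 / 2 - \ell(t) x$, the dissipation $\Psi(v) = \sigma |v|$, the grid, and the function names `work` and `gibbs_density` are all hypothetical. The term $E(t_i, x_i)$ is omitted from the work because it cancels between the numerator and the denominator of (2.14).

```python
import math

def work(x_prev, x_next, load_next, a=1.0, sigma=1.0):
    """W_{i+1}(x_i, x_{i+1}) without the constant E(t_i, x_i), which cancels in (2.14)."""
    energy_next = 0.5 * a * x_next ** 2 - load_next * x_next
    return energy_next + sigma * abs(x_next - x_prev)

def gibbs_density(x_prev, load_next, eps, grid):
    """Tabulate the conditional density (2.14) on an equally spaced grid."""
    ws = [work(x_prev, x, load_next) for x in grid]
    w_min = min(ws)  # shift by the minimum so that the exponentials cannot overflow
    weights = [math.exp(-(w - w_min) / eps) for w in ws]
    h = grid[1] - grid[0]
    mass = h * sum(weights)  # Riemann-sum approximation of the normalizing integral
    return [wt / mass for wt in weights]

grid = [-6.0 + 0.01 * k for k in range(1201)]
density = gibbs_density(x_prev=0.0, load_next=3.0, eps=0.05, grid=grid)
mode = grid[max(range(len(grid)), key=density.__getitem__)]
```

For small `eps` the tabulated density concentrates near the minimizer of the work, which in this example is the rate-independent update $x_{i+1} = 2$.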

Hence, given a partition $P$ of $[0, T]$, an initial state $x_0 \in \mathbb{R}^n$, an energetic
potential $E \colon [0, T] \times \mathbb{R}^n \to \mathbb{R}$ and a dissipation potential $\Psi \colon \mathbb{R}^n \to [0, +\infty)$, the
thermalized gradient descent $X^{(P)}$ denotes the discrete-time Markov chain that has
transition probability densities given by (2.14). By the usual abuse of notation,
$X^{(P)}$ will also denote the càdlàg piecewise-constant interpolation (2.7), defined for
all times $t \in [0, T]$.

In the classical case of linear kinetics (i.e. $\Psi(x) = \frac{1}{2} |x|^2$ for $x \in \mathbb{R}^n$), this
procedure generates the same sequence of densities as the method of [4], and they
converge as $[P] \to 0$ to the solution of the Fokker-Planck equation for the Itô
stochastic gradient descent $\dot{X}(t) = -\nabla E(t, X(t)) + \sqrt{\varepsilon} \, \dot{W}(t)$. Theorem 4.4 establishes
the deterministic limiting behaviour of the stochastic process $X^{(P)}$ as $[P] \to 0$ in
the case of a one-homogeneous dissipation potential $\Psi$.



3. Heuristics and Calculation of Moments 

In this section we perform some calculations to motivate the main result of Sec- 
tion 4. For simplicity, suppose temporarily that E is of the prototypical quadratic 
type 

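\[ E(t, x) = \frac{1}{2} \langle A x, x \rangle - \langle \ell(t), x \rangle \]

for some symmetric and non-negative $A \colon \mathbb{R}^n \to (\mathbb{R}^n)^*$ and some smooth enough
$\ell \colon [0, T] \to (\mathbb{R}^n)^*$; this assumption will be relaxed shortly. Also, merely to aid the
heuristic and simplify the notation, suppose that the parameters $\varepsilon_i > 0$ are all
equal to some constant $\varepsilon > 0$ independent of $i$ and that the time step $\Delta t_i$ is also
independent of $i$.

Consider the following calculation for the conditional expectation of the next
state of the Markov chain $X^{(P)}$ given that $X_i^{(P)} = x_i$:

\begin{align*}
\mathbb{E}\left[ X_{i+1}^{(P)} \,\middle|\, X_i^{(P)} = x_i \right]
&= \int_{\mathbb{R}^n} x_{i+1} \, \rho_{i+1}(x_{i+1} \mid x_i) \, \mathrm{d}x_{i+1} \\
&= \frac{\int_{\mathbb{R}^n} x_{i+1} \exp\bigl( -(E(t_{i+1}, x_{i+1}) - E(t_i, x_i) + \Psi(x_{i+1} - x_i)) / \varepsilon \bigr) \, \mathrm{d}x_{i+1}}{\int_{\mathbb{R}^n} \exp\bigl( -(E(t_{i+1}, x_{i+1}) - E(t_i, x_i) + \Psi(x_{i+1} - x_i)) / \varepsilon \bigr) \, \mathrm{d}x_{i+1}} \\
&= \frac{\int_{\mathbb{R}^n} x_{i+1} \exp\bigl( -(E(t_{i+1}, x_{i+1}) + \Psi(x_{i+1} - x_i)) / \varepsilon \bigr) \, \mathrm{d}x_{i+1}}{\int_{\mathbb{R}^n} \exp\bigl( -(E(t_{i+1}, x_{i+1}) + \Psi(x_{i+1} - x_i)) / \varepsilon \bigr) \, \mathrm{d}x_{i+1}},
\end{align*}

and setting $z := (x_{i+1} - x_i) / \varepsilon$ yields

\[ \mathbb{E}\left[ X_{i+1}^{(P)} \,\middle|\, X_i^{(P)} = x_i \right] = x_i + \varepsilon \, \frac{\int_{\mathbb{R}^n} z \exp\bigl( -( \langle A x_i - \ell(t_{i+1}), z \rangle + \frac{\varepsilon}{2} \langle A z, z \rangle + \Psi(z) ) \bigr) \, \mathrm{d}z}{\int_{\mathbb{R}^n} \exp\bigl( -( \langle A x_i - \ell(t_{i+1}), z \rangle + \frac{\varepsilon}{2} \langle A z, z \rangle + \Psi(z) ) \bigr) \, \mathrm{d}z}. \]

Let

\[ \Psi_\varepsilon^\star(w) := \log \int_{\mathbb{R}^n} \exp\bigl( -( \langle w, z \rangle + \tfrac{\varepsilon}{2} \langle A z, z \rangle + \Psi(z) ) \bigr) \, \mathrm{d}z. \tag{3.1} \]

Then the result of the above calculation may be summarized as

\[ \mathbb{E}\left[ X_{i+1}^{(P)} \,\middle|\, X_i^{(P)} = x_i \right] = x_i - \varepsilon \, \mathrm{D}\Psi_\varepsilon^\star(w) \Big|_{w = A x_i - \ell(t_{i+1})}, \]

i.e.

\[ \mathbb{E}\left[ \Delta X_{i+1}^{(P)} \,\middle|\, X_i^{(P)} = x_i \right] = -\varepsilon \, \mathrm{D}\Psi_\varepsilon^\star(\mathrm{D}E(t_{i+1}, x_i)). \]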



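The same change of variables $z := (x_{i+1} - x_i) / \varepsilon$ gives an estimate for the $p$th
moment of the increments of the Markov chain:

\[ \mathbb{E}\left[ \bigl| \Delta X_{i+1}^{(P)} \bigr|^p \,\middle|\, X_i^{(P)} = x_i \right] = \varepsilon^p \, \frac{\int_{\mathbb{R}^n} |z|^p \exp\bigl( -( \langle A x_i - \ell(t_{i+1}), z \rangle + \frac{\varepsilon}{2} \langle A z, z \rangle + \Psi(z) ) \bigr) \, \mathrm{d}z}{\int_{\mathbb{R}^n} \exp\bigl( -( \langle A x_i - \ell(t_{i+1}), z \rangle + \frac{\varepsilon}{2} \langle A z, z \rangle + \Psi(z) ) \bigr) \, \mathrm{d}z}. \]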



For later reference, these calculations are summarized in the following lemma: 

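Lemma 3.1. Let $E(t, x) = \frac{1}{2} \langle A x, x \rangle - \langle \ell(t), x \rangle$ with $A \colon \mathbb{R}^n \to (\mathbb{R}^n)^*$ symmetric
and non-negative. Suppose also that $\Psi = \chi_{\mathcal{E}}^\star \colon \mathbb{R}^n \to [0, +\infty)$ is 1-homogeneous and
non-degenerate. Let $X^{(P)}$ denote the thermalized gradient descent Markov chain in
$E$ and $\Psi$ on a partition $P$ of $[0, T]$. Then

\[ \mathbb{E}\left[ \Delta X_{i+1}^{(P)} \,\middle|\, X_i^{(P)} = x_i \right] = -\varepsilon \, \mathrm{D}\Psi_\varepsilon^\star(A x_i - \ell(t_{i+1})) \]

and, for $p > 0$,

\[ \mathbb{E}\left[ \bigl| \Delta X_{i+1}^{(P)} \bigr|^p \,\middle|\, X_i^{(P)} = x_i \right] = \varepsilon^p \, \frac{\int_{\mathbb{R}^n} |z|^p \exp\bigl( -( \langle A x_i - \ell(t_{i+1}), z \rangle + \frac{\varepsilon}{2} \langle A z, z \rangle + \Psi(z) ) \bigr) \, \mathrm{d}z}{\int_{\mathbb{R}^n} \exp\bigl( -( \langle A x_i - \ell(t_{i+1}), z \rangle + \frac{\varepsilon}{2} \langle A z, z \rangle + \Psi(z) ) \bigr) \, \mathrm{d}z}. \]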



The above calculations, including Lemma 3.1, also go through even if $E$ is not a
quadratic form. The non-$\Psi$ terms in the exponent are the Taylor series expansion
of $E(t_{i+1}, x_{i+1}) - E(t_{i+1}, x_i)$ about $x_i$ and, therefore, the corresponding expression
for $\Psi_\varepsilon^\star$ is

\[ \Psi_\varepsilon^\star(w) := \log \int_{\mathbb{R}^n} \exp\left( -\left( \langle w, z \rangle + \sum_{k=2}^{\infty} \frac{\varepsilon^{k-1}}{k!} \langle \mathrm{D}^k E(t_{i+1}, x_i), z^{\otimes k} \rangle + \Psi(z) \right) \right) \mathrm{d}z. \tag{3.2} \]

By abuse of notation, Lemma 3.1 will henceforth be taken to refer to the generalized
result for not-necessarily-quadratic $E$ using (3.2).
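The first identity of Lemma 3.1 can be checked numerically in one dimension. The Python sketch below is only a sanity check under stated assumptions: the parameter values, the truncation of the integrals to a finite interval, the trapezoidal quadrature and the finite-difference approximation of $\mathrm{D}\Psi_\varepsilon^\star$ are all choices made here, not part of the text. It computes the conditional mean increment directly from the Gibbs density in the variable $x_{i+1}$ and compares it with $-\varepsilon \, \mathrm{D}\Psi_\varepsilon^\star(A x_i - \ell(t_{i+1}))$ from (3.1).

```python
import math

A, SIGMA, EPS = 1.0, 1.0, 0.1   # hypothetical parameter values
Z_MAX, N = 60.0, 20000          # truncation and resolution of the quadrature grid

def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

def psi_eps_star(w):
    """One-dimensional version of (3.1), integrated over [-Z_MAX, Z_MAX]."""
    h = 2.0 * Z_MAX / N
    zs = [-Z_MAX + k * h for k in range(N + 1)]
    values = [math.exp(-(w * z + 0.5 * EPS * A * z * z + SIGMA * abs(z))) for z in zs]
    return math.log(trapezoid(values, h))

def mean_increment(x_i, load_next):
    """E[X_{i+1} - x_i | X_i = x_i], integrated directly against the Gibbs density."""
    def work(x):
        return 0.5 * A * x * x - load_next * x + SIGMA * abs(x - x_i)
    h = 2.0 * EPS * Z_MAX / N
    xs = [x_i - EPS * Z_MAX + k * h for k in range(N + 1)]
    weights = [math.exp(-(work(x) - work(x_i)) / EPS) for x in xs]
    numerator = trapezoid([(x - x_i) * wt for x, wt in zip(xs, weights)], h)
    return numerator / trapezoid(weights, h)

x_i, load_next = 0.2, 0.5
w = A * x_i - load_next          # DE(t_{i+1}, x_i) for the quadratic energy
step = 1e-4                      # finite-difference step in w
d_psi = (psi_eps_star(w + step) - psi_eps_star(w - step)) / (2.0 * step)
direct = mean_increment(x_i, load_next)
formula = -EPS * d_psi
```

With these values the state is stable ($|w| < \sigma$) and the load exceeds $A x_i$, so the mean increment is positive but of order $\varepsilon$, in line with the lemma.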

Note, however, that in none of these expressions does the time increment appear
explicitly. This is to be expected, since the original evolution was a rate-independent
one. Therefore, in order to obtain a Markov chain that takes any account of time,
it will be necessary to take $\varepsilon$ to be proportional to the time step. Physically, since
$E$, $\Psi$ and $\varepsilon$ all have the units of energy, this corresponds to assuming that the heat
bath supplies energy to the system at a constant rate: the power of the heat bath
is the constant of proportionality $\theta$ between $\varepsilon$ and the time step. The parameter
$\theta$ measures the intensity of the heat bath and can be seen, in some sense, as the
"temperature".

The potential $\Psi_\varepsilon^\star \colon (\mathbb{R}^n)^* \to [0, +\infty]$ encodes a great deal of information about
the Markov chain $X$. Most of the terms in the exponent of $\Psi_\varepsilon^\star$ are of order $\varepsilon$
or higher, and so can reasonably be expected to have no influence in the limit as
$[P]$ tends to zero in proportion to $\varepsilon$. The limiting dynamics of the Markov chain
are expected to be controlled by an effective dual dissipation potential $\Psi_0^\star$, which
is $\Psi_\varepsilon^\star$ with these higher-order terms omitted. Furthermore, the strong similarity
to the Euler method for an ordinary differential equation and the fact that the
variances are of order $\varepsilon^2 \ll \varepsilon$ suggest that the limiting evolution takes the form of
a deterministic ordinary differential equation

\[ \dot{y}(t) = -\theta \, \mathrm{D}\Psi_0^\star(\mathrm{D}E(t, y(t))), \tag{3.3} \]

where $\theta = \varepsilon_i / \Delta t_i$. By convex duality, (3.3) is equivalent to

\[ \mathrm{D}\Psi_0\left( -\frac{\dot{y}(t)}{\theta} \right) = \mathrm{D}E(t, y(t)). \tag{3.4} \]

If $\Psi$ is even (i.e. $\Psi(x) = \Psi(-x)$), then so is $\Psi_0$, in which case (3.4) is equivalent to
the non-linear gradient descent

\[ \mathrm{D}\Psi_0\left( \frac{\dot{y}(t)}{\theta} \right) = -\mathrm{D}E(t, y(t)). \tag{3.5} \]

Therefore, the conjecture is that the effective behaviour of the rate-independent
process in $E$ with respect to $\Psi$ when brought into contact with the heat bath is
that of a gradient descent in $E$ with respect to the non-linear effective dissipation
potential $\Psi_0$.

4. Main Results 

In this section the formal manipulations of the previous section are made more
precise: the effective (dual) dissipation potential that corresponds to $\Psi$ is introduced
and its properties examined; and the main convergence theorem about the limiting
behaviour of the thermalized gradient descent Markov chain $X^{(P)}$ as $[P] \to 0$ is
stated.

4.1. Effective Dissipation Potential. As mentioned above, the effective dual
dissipation potential $\Psi_0^\star$ is simply the functional $\Psi_\varepsilon^\star$ of (3.2) with $\varepsilon$ set equal to
zero, and $\Psi_0$ is its convex conjugate:

Definition 4.1. Given $\Psi \colon \mathbb{R}^n \to [0, +\infty]$ homogeneous of degree one, define the
associated effective dual dissipation potential $\Psi_0^\star \colon (\mathbb{R}^n)^* \to [0, +\infty]$ by

\[ \Psi_0^\star(w) := \log \int_{\mathbb{R}^n} \exp\bigl( -( \langle w, z \rangle + \Psi(z) ) \bigr) \, \mathrm{d}z. \tag{4.1} \]

The associated effective dissipation potential $\Psi_0 \colon \mathbb{R}^n \to [0, +\infty]$ is the Cramér
transform of $\Psi$ and is defined by convex conjugation: $\Psi_0 := (\Psi_0^\star)^\star$, i.e.

\[ \Psi_0(x) := \sup \{ \langle w, x \rangle - \Psi_0^\star(w) \mid w \in (\mathbb{R}^n)^* \}. \tag{4.2} \]
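To make Definition 4.1 concrete, the following Python sketch evaluates the Cramér transform for the illustrative choice $\Psi(x) = |x|$ in dimension one. Direct integration of (4.1) in this case gives $\log 2 - \log(1 - w^2)$ for $|w| < 1$; the additive constant $\log 2$ only shifts $\Psi_0$ by a constant and does not change its shape. The closed-form maximizer $w = x / (1 + \sqrt{1 + x^2})$ used below follows from setting the derivative of $w x - \Psi_0^\star(w)$ to zero. The function names and the grid-based cross-check are assumptions made for this sketch.

```python
import math

def psi0_star(w):
    """(4.1) in one dimension with Psi(x) = |x|: log(2 / (1 - w**2)) for |w| < 1."""
    return math.log(2.0) - math.log(1.0 - w * w)

def psi0(x):
    """Convex conjugate (4.2), using the maximizer w = x / (1 + sqrt(1 + x**2))."""
    w = x / (1.0 + math.sqrt(1.0 + x * x))
    return w * x - psi0_star(w)

def psi0_brute_force(x, n=20000):
    """Supremum in (4.2) approximated on a grid of w in the open interval (-1, 1)."""
    best = -math.inf
    for k in range(1, n):
        w = -1.0 + 2.0 * k / n
        best = max(best, w * x - psi0_star(w))
    return best

# Behaviour near the origin and far from it.
quadratic_ratio = (psi0(0.01) - psi0(0.0)) / 0.01 ** 2   # close to 1/4
linear_ratio = psi0(1000.0) / 1000.0                      # close to 1
```

The two ratios illustrate that, in this example, $\Psi_0$ is approximately 2-homogeneous near the origin and grows approximately linearly for large arguments.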

Up to a minus sign, $\Psi_0^\star$ is the logarithmic moment generating function (or
cumulant generating function) of the Borel measure $\psi$ on $\mathbb{R}^n$ defined by

\[ \mathrm{d}\psi(z) := \exp(-\Psi(z)) \, \mathrm{d}z. \tag{4.3} \]

It is often convenient to write $\Psi_0^\star$ as an integral over the Euclidean unit sphere
$\mathbb{S}^{n-1} \subset \mathbb{R}^n$ with respect to $(n - 1)$-dimensional Hausdorff measure $\mathcal{H}^{n-1}$:

\[ \Psi_0^\star(w) = \log \int_{\mathbb{S}^{n-1}} \frac{(n - 1)!}{( \langle w, u \rangle + \Psi(u) )^n} \, \mathrm{d}\mathcal{H}^{n-1}(u). \tag{4.4} \]

Note that $\Psi_0$ and $\Psi_0^\star$ are objects that are intrinsic to the dissipation, not the
energetic structure: they are determined entirely by the duality between $\mathbb{R}^n$ and
$(\mathbb{R}^n)^*$ and the dissipation potential $\Psi$ (or, equivalently, the geometry of the elastic
region $\mathcal{E}$). Proposition 4.2 summarizes the important properties of the effective
dual dissipation potential $\Psi_0^\star$; the proof is deferred to Section 6.

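Proposition 4.2. If $\Psi = \chi_{\mathcal{E}}^\star \colon \mathbb{R}^n \to \mathbb{R}$ satisfies (2.9), then $\Psi_0^\star \colon (\mathbb{R}^n)^* \to [0, +\infty]$
defined as in (4.1) satisfies

(1) $\Psi_0^\star(w) > -\infty$ for all $w \in (\mathbb{R}^n)^*$;

(2) $\Psi_0^\star(w) < +\infty \iff -w \in \operatorname{int} \mathcal{E}$;

(3) $\Psi_0^\star$ is convex on $(\mathbb{R}^n)^*$;

(4) $\Psi_0^\star$ is smooth on $-\operatorname{int} \mathcal{E}$;

(5) $\Psi_0^\star(w)$ and $|\mathrm{D}\Psi_0^\star(w)| \to +\infty$ as $-w \to \partial \mathcal{E}$.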

Proposition 4.2 immediately implies that $\Psi_0$ is smooth and strictly convex. In
some special cases of interest, $\Psi_0^\star$ can be determined explicitly:

Figure 4.1. The effective (dual) dissipation potential in dimension one, with
dissipation potential $\Psi(x) := |x|$. Panel (a): $\Psi_0^\star(w) = -\log(1 - w^2)$. Panel (b): $\Psi$
(dashed) and $\Psi_0$ (solid). Note the linear growth of $\Psi_0$ for large $|x|$ and its
approximate 2-homogeneity near the origin.



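(1) Suppose that the elastic region $\mathcal{E}$ is a rectangular box with faces perpendicular
to the coordinate axes in $(\mathbb{R}^n)^*$:

\[ \mathcal{E} := \{ w = (w_1, \dots, w_n) \in (\mathbb{R}^n)^* \mid |w_i| \leq \sigma_i \text{ for } i = 1, \dots, n \}. \tag{4.5} \]

Then the dissipation potential $\Psi$ is the weighted $\ell^1$ "Manhattan" norm
$\Psi(x) = \sum_{i=1}^{n} \sigma_i |x_i|$ and

\[ \Psi_0^\star(w) = -\sum_{i=1}^{n} \log\bigl( \sigma_i^2 - |w_i|^2 \bigr). \tag{4.6} \]

(2) Suppose that the elastic region $\mathcal{E}$ is a Euclidean ball

\[ \mathcal{E} := \{ w = (w_1, \dots, w_n) \in (\mathbb{R}^n)^* \mid |w_1|^2 + \dots + |w_n|^2 \leq \sigma^2 \}. \tag{4.7} \]

Then the dissipation potential $\Psi$ is exactly $\sigma$ times the usual Euclidean
norm and

\[ \Psi_0^\star(w) = -\frac{n + 1}{2} \log\bigl( \sigma^2 - |w|^2 \bigr). \tag{4.8} \]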
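The explicit formulas above can be compared with a direct quadrature of (4.1). The Python sketch below does this for $n = 1$ and $\Psi(x) = \sigma |x|$, where the box and the ball coincide. Evaluating the integral in (4.1) by hand gives $1/(\sigma + w) + 1/(\sigma - w) = 2\sigma / (\sigma^2 - w^2)$, so the quadrature agrees with (4.6) and (4.8) up to the additive constant $\log(2\sigma)$; this constant does not affect $\mathrm{D}\Psi_0^\star$ and hence does not affect the dynamics (3.3). The truncation interval, grid size and function names are assumptions of this sketch.

```python
import math

def psi0_star_quadrature(w, sigma, z_max=60.0, n=100000):
    """log of the integral in (4.1) for n = 1, by the composite trapezoidal rule."""
    h = 2.0 * z_max / n
    total = 0.0
    for k in range(n + 1):
        z = -z_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-(w * z + sigma * abs(z)))
    return math.log(h * total)

def psi0_star_closed_form(w, sigma):
    """log(2*sigma) - log(sigma**2 - w**2), valid for |w| < sigma."""
    return math.log(2.0 * sigma) - math.log(sigma * sigma - w * w)

def d_psi0_star(w, sigma):
    """Derivative of the closed form; the additive constant plays no role here."""
    return 2.0 * w / (sigma * sigma - w * w)

sigma = 2.0
errors = [abs(psi0_star_quadrature(w, sigma) - psi0_star_closed_form(w, sigma))
          for w in (-1.0, 0.0, 1.0)]
```

The list `errors` collects the discrepancy between quadrature and closed form at three sample points inside the elastic region.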

4.2. Convergence Theorem. To the standing assumption that $\Psi = \chi_{\mathcal{E}}^\star$ satisfies
(2.8) and (2.9), we now add some assumptions on the energetic potential $E$.
$E \colon [0, T] \times \mathbb{R}^n \to \mathbb{R}$ is assumed to be bounded below, smooth in space with all
derivatives uniformly bounded, and such that $(t, x) \mapsto \mathrm{D}E(t, x)$ is uniformly
Lipschitz. It is also assumed that $E$ is convex, and hence that the Hessian of $E$ is a
non-negative operator. Two further, more technical, assumptions are also required.
Both of these assumptions are satisfied in the prototypical case

\[ E(t, x) := \frac{1}{2} \langle A x, x \rangle - \langle \ell(t), x \rangle, \]

where $\ell \colon [0, T] \to (\mathbb{R}^n)^*$ is Lipschitz, and $A \colon \mathbb{R}^n \to (\mathbb{R}^n)^*$ is symmetric and
non-negative. In this case, the stable region at time $t \in [0, T]$ is the preimage

\[ S(t) = A^{-1}(\ell(t) - \mathcal{E}) \]

Figure 4.2. An example of a strictly convex potential $V$ for which
$\Psi_0^\star \circ \mathrm{D}V$ is not convex. Panel (a): comparison of $V$ (convex, dashed)
and $\Psi_0^\star \circ \mathrm{D}V$ (non-convex, solid). Panel (b): the potential gradient $\mathrm{D}V$.
In dimension 1, consider $\Psi(x) := |x|$ and
$V(x) := \frac{x^2}{20} + \frac{1}{50} \log\cosh(10x)$. In this case, $\Psi_0^\star(w) = -\log(1 - w^2)$
and $\mathrm{D}V(x) = \frac{1}{5} \tanh(10x) + \frac{x}{10}$. The composition $x \mapsto \Psi_0^\star(\mathrm{D}V(x))$
is evidently non-convex, although it is quasiconvex (i.e. it has convex sublevel sets).

and is convex and closed for every t; if A is positive-definite, then S(t) is also 
bounded, and hence compact. The prototypical case was examined in [14]; the 
technical conditions that follow were introduced in [13]. 

In order to control certain error terms in the proof of Lemma 6.4, which leads to
Theorem 4.4, a monotonicity assumption is used to ensure that these terms have
the right sign regardless of their magnitude. The requisite assumption is that

\[ \text{for all } t \in [0, T], \quad x \mapsto \Psi_0^\star(\mathrm{D}E(t, x)) \text{ is convex}, \tag{4.9} \]

or, equivalently, that $\mathrm{D}\bigl( \Psi_0^\star(\mathrm{D}E(t, \cdot)) \bigr)$ is a monotone vector field for every $t \in [0, T]$.
This is a non-trivial assumption even if $E$ is strictly convex, as the example
illustrated in Figure 4.2 shows. Note also that (4.9) presupposes that the set $S(t)$ of
stable states is convex for every $t \in [0, T]$, and that convexity of $E(t, \cdot)$ does not
imply convexity of $S(t)$: see e.g. the kidney-shaped stable set of [9, Example 5.5].
Nevertheless, (4.9) holds in the prototypical case, since $\mathrm{D}E(t, x) = A x - \ell(t)$ is an
affine function and the composition of a convex function with an affine one always
yields a convex function [2, §3.2].

It is also necessary to place an implicit constraint on the time-dependency of
$E$. The problem to be avoided is that all the estimates for the moments of the
increments $\Delta X_{i+1}^{(P)}$ blow up as $-\mathrm{D}E(t_i, X_i^{(P)})$ approaches the yield surface $\partial \mathcal{E}$. The
situation to be avoided can be expressed neatly in terms of the proposed limiting
deterministic process and the effective dual dissipation potential. Therefore, we
impose the following finite energy criterion:

\[ \left. \begin{array}{l} \mathcal{T} \text{ an interval of time starting at } 0, \\ y(0) \text{ such that } -\mathrm{D}E(0, y(0)) \in \operatorname{int} \mathcal{E}, \\ \dot{y} = -\theta \, \mathrm{D}\Psi_0^\star(\mathrm{D}E(t, y)) \text{ on } \mathcal{T} \end{array} \right\} \implies \sup_{t \in \mathcal{T}} \Psi_0^\star(\mathrm{D}E(t, y(t))) < +\infty. \tag{4.10} \]

Condition (4.10) is somewhat implicit, but appears to be unavoidable if energies
that have neither identically zero nor constant positive-definite Hessian are to be
considered. If $E$ is of the prototypical quadratic form with $A = 0$, then (4.10) holds
if the applied load $\ell$ is never greater than the dissipation, i.e. if

\[ \inf_{t \in [0, T]} \inf_{|z| = 1} \bigl( \Psi(z) - \ell(t) \cdot z \bigr) > 0. \]

(N.B. In this case of "small $\ell$", the rate-independent process is static but the
thermalized process (3.4) is not.) If $A$ is positive-definite, then (4.10) always holds
whenever the initial condition satisfies $y(0) \in S(0) = A^{-1}(\ell(0) - \mathcal{E})$. Indeed, for
not-necessarily-quadratic energies, uniform convexity of $E$ implies the condition
(4.10):

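Lemma 4.3. Suppose that there exists $\gamma_E > 0$ such that

\[ \langle \mathrm{D}^2 E(t, x), v \otimes v \rangle \geq \gamma_E |v|^2 \quad \text{for all } t \in [0, T] \text{ and all } x, v \in \mathbb{R}^n, \]

and that $\| \partial_t \mathrm{D}E \|_{L^\infty} < +\infty$. Then (4.10) holds.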

See Section 6 for the proof of Lemma 4.3. The main result is now as follows: 

Theorem 4.4. Suppose that $E$, $\Psi$ satisfy the hypotheses above, including (4.9) and
(4.10), and fix $\theta > 0$. Then, as $[P] \to 0$, the piecewise constant càdlàg interpolants
of $X^{(P)}$ with $\varepsilon_i = \theta \Delta t_i$ converge in probability in the uniform norm to the
deterministic non-linear gradient descent $y \colon [0, T] \to \mathbb{R}^n$ satisfying (3.4), with the same
initial condition $X_0^{(P)} = y(0) = x_0 \in S(0)$. More precisely, for any $T > 0$, $\eta > 0$,
there exists a constant $C > 0$ such that, for all small enough $[P]$,

\[ \mathbb{P}\left[ \sup_{t \in [0, T]} \bigl| X^{(P)}(t) - y(t) \bigr| \geq \eta \right] \leq C [P]^{1/2}. \tag{4.11} \]

Proof. The claim follows from the standard $O([P])$ global error bound for deterministic
Euler schemes, the $O(\varepsilon^{1/2})$ estimate of Lemma 6.3, and the $O([P])$ estimate
of Lemma 6.4. □

An illustrative comparison of the original rate-independent evolution and the
effect of the heat bath is given in Figure 4.3. Note that when $\theta$ is large (which
corresponds to the heat bath being very hot), $\dot{y}(t) / \theta$ typically lies in the region of
$\mathbb{R}^n$ close to the origin where $\Psi_0$ is approximately 2-homogeneous; when $\theta$ is small
(which corresponds to the heat bath being cold), $\dot{y}(t) / \theta$ typically lies in the region
of $\mathbb{R}^n$ far from the origin where $\Psi_0$ is approximately 1-homogeneous. Indeed, as
$\theta \to 0$, the original rate-independent dynamics are recovered.
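The limiting dynamics (3.3) can be explored numerically. The Python sketch below integrates (3.3) by the explicit Euler method in one dimension, under assumptions chosen purely for illustration: the quadratic energy $E(t, x) = a x^2 / 2 - \ell(t) x$, the dissipation $\Psi(x) = \sigma |x|$ (so that $\mathrm{D}\Psi_0^\star(w) = 2w / (\sigma^2 - w^2)$), the sinusoidal load, the step size and the two values of $\theta$ are all hypothetical and are not the parameters of Figure 4.3.

```python
import math

def d_psi0_star(w, sigma):
    """Derivative of the one-dimensional effective dual potential for Psi = sigma*|x|."""
    return 2.0 * w / (sigma * sigma - w * w)

def load(t):
    # Illustrative load; an assumption, not from the source.
    return math.sin(2.0 * math.pi * t)

def thermalized_path(theta, y0=0.0, a=1.0, sigma=1.0, t_end=1.0, n_steps=10000):
    """Explicit Euler scheme for (3.3) with E(t, x) = a*x**2/2 - load(t)*x."""
    dt = t_end / n_steps
    ys, forces = [y0], []
    for k in range(n_steps):
        w = a * ys[-1] - load(k * dt)          # DE(t, y(t))
        if abs(w) >= sigma:
            raise ValueError("state left the stable region; use a smaller step")
        forces.append(w)
        ys.append(ys[-1] - dt * theta * d_psi0_star(w, sigma))
    return ys, forces

_, forces_hot = thermalized_path(theta=5.0)
_, forces_cold = thermalized_path(theta=0.5)
max_hot = max(abs(w) for w in forces_hot)
max_cold = max(abs(w) for w in forces_cold)
```

In this example the hot bath keeps $\mathrm{D}E(t, y(t))$ well inside the elastic region, whereas for the colder bath it approaches the yield surface more closely without reaching it.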



5. Conclusions and Outlook 

There are three natural directions in which the results of this paper could be
generalized. First and most obviously, the smoothness, convexity and other structural
assumptions on E could be relaxed: so far, the various error terms in the proof of 
Theorem 4.4 have been controlled by convexity and conditions like (4.9) and (4.10); 
in principle, so long as those error terms can be controlled (at least locally in time 
and space), Theorem 4.4 should generalize to the non-convex case. This would be a 
very interesting generalization, since the solution to the rate-independent problem
in a non-convex energetic potential $E$ is not always unique; the thermalization
procedure could provide a selection principle if the thermalized process has a unique
limit as $\theta \to 0$.

Figure 4.3. Comparison of the original rate-independent evolution ($z$, dashed)
and the effect of the heat bath ($y$, solid). The frontier of the stable region is shown
dotted. Parameters: $\Psi(x) = 2|x|$, $E(t, x) = 4|x|^2 - \langle 3 \sin 2\pi t, x \rangle$, initial condition $0.15$.

Secondly, since most rate-independent processes of interest are infinite-dimensional,
or even posed on spaces that lack a linear structure, more general
state spaces than $\mathbb{R}^n$ could be considered. This is a potentially subtle topic, since
in infinite-dimensional settings there is no obvious candidate for a reference mea- 
sure with respect to which to take densities to calculate the entropy in (2.12). As 
noted in [13, Theorem 5.3.5], the Markov chains of study are not invariant under 
change of reference measure: the logarithm of the Radon-Nikodym derivative of the 
change of measure acts as an additive perturbation of the energetic potential. The 
calculations of Section 3 are quite interesting in general: they amount to a study 
of the tangent measures (in the sense of [12] & al.) of the Gibbsian distribution 
(2.14). 

Thirdly, the limiting result of Theorem 4.4 should be seen as a first-order
approximation that is valid for small positive "temperatures" $\theta$. It would be interesting
to examine the behaviour of a suitable rescaling of $X - y$ and determine whether it
obeys, say, a large deviations principle with respect to a suitable rate function.

6. Proofs and Supporting Results 

Lemma 6.1. Let $\Psi \colon \mathbb{R}^n \to [0, +\infty)$ be one-homogeneous, continuous, and
non-degenerate as in (2.8)-(2.9). Let $m \colon (\mathbb{R}^n)^* \to \mathbb{R}$ be given by

\[ m(v) := \inf \{ \langle v, u \rangle + \Psi(u) \mid u \in \mathbb{S}^{n-1} \}, \]

where $\mathbb{S}^{n-1} \subset \mathbb{R}^n$ denotes the Euclidean unit sphere. Then $m$ is continuous and

\[ m(v) \begin{cases} > 0, & \text{if } -v \in \operatorname{int} \mathcal{E}, \\ = 0, & \text{if } -v \in \partial \mathcal{E}, \\ < 0, & \text{if } -v \notin \mathcal{E}. \end{cases} \]

Proof. To save space, write $f(v, x) := \langle v, x \rangle + \Psi(x)$. Since $\chi_{\mathcal{E}}$ is convex and lower
semi-continuous,

\[ \inf_{x \in \mathbb{R}^n} f(v, x) = -\Psi^\star(-v) = -\chi_{\mathcal{E}}(-v). \tag{6.1} \]

Since $f(v, x)$ is 1-homogeneous in $x$, it follows that $m(v) < 0$ for $-v \notin \mathcal{E}$ and that
$m(v) \geq 0$ for $-v \in \mathcal{E}$.

Note that $f$ is continuous. Since $m$ is a pointwise infimum of a family of continuous
functions, it is upper semi-continuous. Since $\mathbb{S}^{n-1}$ is compact, $m$ is a pointwise
infimum of a compactly-parametrized family of continuous functions, and so is also
lower semi-continuous [11]. Thus, $m$ is continuous.

Suppose that there exists $-v \in \operatorname{int} \mathcal{E}$ with $m(v) = 0$. By the compactness of $\mathbb{S}^{n-1}$,
this implies that there exists a unit vector $u_0 \in \mathbb{S}^{n-1}$ with

\[ f(v, u_0) = 0. \tag{6.2} \]

But, since $-v \in \operatorname{int} \mathcal{E}$, there exists $\alpha > 1$ such that $-\alpha v \in \mathcal{E}$. Then (6.2) implies that
$\langle v, u_0 \rangle < 0$ (because $\Psi(u_0) > 0$ by (2.9)), so $f(\alpha v, u_0) < 0$, and so $m(\alpha v) < 0$, which
contradicts (6.1). Hence, $m(v) > 0$ for $-v \in \operatorname{int} \mathcal{E}$.

It remains only to show that $m(v) = 0$ for $-v \in \partial \mathcal{E}$. Suppose not, i.e. that
there exists $-v \in \partial \mathcal{E}$ with $m(v) > 0$. Since $-v \in \partial \mathcal{E}$ and $\mathcal{E}$ is convex (and hence
star-convex with respect to the origin in $(\mathbb{R}^n)^*$), for every $\alpha > 1$, $-\alpha v \notin \mathcal{E}$, and so
$m(\alpha v) < 0$. Hence, by the continuity of $m$,

\[ 0 \geq \lim_{\alpha \searrow 1} m(\alpha v) = m(v) > 0, \]

which is a contradiction. This completes the proof. □



Proof of Proposition 4-2. Let i/j denote the Borel measure on K" defined by (4.3). 

(1) By (2.9), ijj is a strictly positive and finite measure. Hence, since the expo- 
nential function in the integrand of (4.1) is never zero, the claim follows. 

(2) Consider the spherical integral form (4.4) for ^* . By Lemma 6.1, if — w £ 
£, then the integral is that of a continuous and bounded function over a 
compact set, so the integral exists and is finite. 

If — w £ d£, then, as in the proof of Lemma 6.1, there exists u w £ § rl_1 
with (w,u w ) + ^(utu) = 0, so the integrand has a pole. The triangle 
inequality for ^> implies that for u w + u £ W" 1 , 

(w, u w + u) + ^(u w + u) < (w, u w ) + (w, u) + fy(u w ) + ty(u) 

= (w, u) + 
< \u\(\w\+C 9 ). 

Hence, the integrand in (4.4) grows more quickly than \u\~ n as |u| — > 0; 
hence, by the standard result that x i— ^ \x\~ a lies in L 1 for a d-dimensional 
domain about if, and only if, a < d, it follows that ^*(w) = +oo. 

If — w £ £, then Lemma 6.1 implies that the integral in (4.1) does not 
converge, and so ^>*(w) = +oo. 
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(3) Let Z*(w) := exp$JO), and let v,w &-£ and < s < 1. Then 
Z£(sv + (1 - s)w) 

exp ( — (sv + (1 — s)w, z)) dijj(z) 



< 



exp ( — s(v, z)) exp ( — (1 — s)(w, z)j dip(z) 

(^J exp(- (v,z})di/j(z)J (^J exp ( - (w, z)) dil>(z)j 
Z*{v*) s Z*{w) l - s , 



where the inequality follows from Holder's inequality. Hence, since the 
logarithm is a monotonically increasing function, for all v, w G —£, 

^*(sv + (1 - s)w) < sV*(v*) + (1 - s)$*(w). 

Moreover, since — £ is convex and M/* is identically +oo outside the interior 
of -£ , is convex on all of (R™)*. 
(4) The derivative DZ£ : - £ -> (R n )** = R n can be computed using the 
standard theorem on differentiation under the integral sign, yielding 

DZ*(v)= [ -zexp(- (v,z))d^(z), 

JR™ 

and so on for higher-order derivatives: 

D fc Z » = / (-zf k cxp(-(v,z))diP(z). 



The integrals involved are all finite for — v € £ because of the exponentially 
small tails of the measure ip. 
(5) As in the proof of the second part of the claim, let $-w \in \mathscr{E}$ and let $u_w \in \mathbb{S}^{n-1}$ be such that $\langle w, u_w \rangle + \Psi(u_w)$ is minimal (i.e. equals $m(w)$). Then

\[
\langle w, u_w + u \rangle + \Psi(u_w + u) \le m(w) + |u| \, (|w| + C_\Psi).
\]

Since $m(w) \to 0$ as $-w \to \partial\mathscr{E}$, the same argument as in point (2) applies, and so $\Psi_0^\star(w) \to +\infty$ as $-w \to \partial\mathscr{E}$. Now suppose that $D\Psi_0^\star$ does not blow up. Then, since $\Psi_0^\star$ is smooth and $\mathscr{E}$ is compact, $\Psi_0^\star$ would be bounded on $-\mathscr{E}$, which is a contradiction. □
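
Parts (2), (3) and (5) can be checked by hand in one dimension. The following Python sketch does so under assumptions that are not taken from the text: $n = 1$, $\Psi(z) = \sigma |z|$ (so that the elastic region is $[-\sigma, \sigma]$), and $d\psi(z) = e^{-\Psi(z)} \, dz$, in which case the integral evaluates to $2\sigma / (\sigma^2 - w^2)$ for $|w| < \sigma$. All function names are hypothetical.

```python
import math


def z0_closed_form(w, sigma=1.0):
    """Integral of exp(-(w*z + sigma*|z|)) dz over the real line.

    Assumes Psi(z) = sigma*|z| (an illustrative choice). The two half-line
    integrals give 1/(sigma + w) + 1/(sigma - w), finite only for |w| < sigma.
    """
    if abs(w) >= sigma:
        return math.inf
    return 2.0 * sigma / (sigma**2 - w**2)


def z0_midpoint(w, sigma=1.0, half_width=40.0, n_cells=80_000):
    """Midpoint-rule approximation of the same integral on [-half_width, half_width]."""
    step = 2.0 * half_width / n_cells
    total = 0.0
    for j in range(n_cells):
        z = -half_width + (j + 0.5) * step
        total += math.exp(-(w * z + sigma * abs(z)))
    return total * step


def psi0_star(w, sigma=1.0):
    """Logarithm of the integral; +inf on and outside the boundary of [-sigma, sigma]."""
    value = z0_closed_form(w, sigma)
    return math.inf if math.isinf(value) else math.log(value)


def holder_gap(v, w, s, sigma=1.0):
    """Z(v)**s * Z(w)**(1-s) - Z(s*v + (1-s)*w), non-negative by Hoelder's inequality."""
    mixed = z0_closed_form(s * v + (1.0 - s) * w, sigma)
    return z0_closed_form(v, sigma) ** s * z0_closed_form(w, sigma) ** (1.0 - s) - mixed


if __name__ == "__main__":
    print(z0_closed_form(0.5), z0_midpoint(0.5))        # closed form vs quadrature
    print(holder_gap(-0.9, 0.7, 0.3))                   # log-convexity gap
    print([psi0_star(w) for w in (0.9, 0.99, 0.999)])   # growth towards the boundary
```

In this toy case the logarithm of the integral is $\log(2\sigma) - \log(\sigma^2 - w^2)$, which is convex on $(-\sigma, \sigma)$ and tends to $+\infty$ as $|w| \to \sigma$, in line with parts (3) and (5).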

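Proof of Lemma 4.3. The energy evolution equation for $\Psi_0^\star$ along $y^0$ can be calculated using the chain rule, yielding

\[
\begin{aligned}
\frac{d}{dt} \Psi_0^\star\bigl(DE(t, y^0(t))\bigr)
&= -D^2 E(t, y^0(t)) \, \bigl(D\Psi_0^\star(DE(t, y^0(t)))\bigr)^{\otimes 2} + \bigl\langle \partial_t DE(t, y^0(t)), D\Psi_0^\star(DE(t, y^0(t))) \bigr\rangle \\
&\le -D^2 E(t, y^0(t)) \, \bigl(D\Psi_0^\star(DE(t, y^0(t)))\bigr)^{\otimes 2} + \|\partial_t DE\|_{L^\infty} \, \bigl|D\Psi_0^\star(DE(t, y^0(t)))\bigr|.
\end{aligned}
\]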



Proposition 4.2(5) implies that if $\Psi_0^\star$ blows up along any curve (i.e. one that approaches $-\partial\mathscr{E}$ in the dual space), then so does $|D\Psi_0^\star|$. However, the mean value






theorem and the above calculation imply that $\Psi_0^\star$ must be decreasing when $|D\Psi_0^\star|$ is large. This yields the desired contradiction. □

The next lemma (Lemma 6.2) concerns the closeness of the effective dual dissipation potential $\Psi_0^\star$ and the corresponding quantity $\Psi_\varepsilon^\star$ that controls the increments of the Markov chain. Lemma 6.3 gives the resulting bound for the classical gradient descents in $\Psi_\varepsilon^\star \circ DE$ and $\Psi_0^\star \circ DE$. Both results apply to the prototypical case of a quadratic energetic potential.

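Lemma 6.2. Suppose that the energetic potential $E$ is smooth enough that

\[
M := \sup_{t \in [0,T]} \sup_{k \ge 2} \| D^k E(t, \cdot) \|_{\mathrm{op}} < +\infty,
\]

where $\| \cdot \|_{\mathrm{op}}$ denotes the operator norm. Then, for every $K \Subset -\mathscr{E}$ and every $k \in \mathbb{N} \cup \{0\}$, $D^k \Psi_\varepsilon^\star \to D^k \Psi_0^\star$ uniformly on $K$ as $\varepsilon \to 0$. More precisely, for every such $K$ and $k$, there exists a constant $C > 0$ such that

\[
\sup_{w \in K} \bigl| D^k \Psi_\varepsilon^\star(w) - D^k \Psi_0^\star(w) \bigr| \le C \varepsilon^{1/2} \quad \text{for all small enough } \varepsilon > 0.
\]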

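Proof. The essential quantity to estimate is

\[
I_{k,\varepsilon}(w) := \int_{\mathbb{R}^n} |z|^k e^{-(\langle w, z \rangle + \Psi(z))} \left| 1 - \exp\left( -\sum_{\ell=2}^{\infty} \frac{\varepsilon^{\ell-1}}{\ell!} \bigl\langle D^\ell E(t,x), z^{\otimes \ell} \bigr\rangle \right) \right| dz,
\]

since, by the elementary inequality

\[
\left| \frac{a}{b} - \frac{c}{d} \right| \le \frac{|a - c|}{|b|} + \frac{|c| \, |b - d|}{|bd|},
\]

it holds true that

\[
\bigl| D^k \Psi_\varepsilon^\star(w) - D^k \Psi_0^\star(w) \bigr| \le \frac{I_{k,\varepsilon}(w)}{Z_\varepsilon^\star(w)} + \frac{|D^k Z_0^\star(w)|}{Z_0^\star(w) \, Z_\varepsilon^\star(w)} \, I_{0,\varepsilon}(w). \tag{6.3}
\]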

Let $m(w) := \inf \{ \langle w, z \rangle + \Psi(z) \mid |z| = 1 \}$. By Lemma 6.1, $m$ is continuous and bounded away from $0$ on $K$. Similarly, since $Z_0^\star$ and $Z_\varepsilon^\star$ are continuous and positive, they are bounded away from $0$ on $K$, and $|D^k Z_0^\star|$ is bounded on $K$. (Note that all these bounds fail on $-\partial\mathscr{E}$, so the assumption that $K \Subset -\mathscr{E}$ is essential.) Thus, the emphasis is on estimating $I_{k,\varepsilon}(w)$ in terms of $\varepsilon$ and uniformly over $K$.

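$I_{k,\varepsilon}(w)$ will be estimated by splitting the integral into two parts: an integral over a ball around the origin in $\mathbb{R}^n$, and an integral over the complement. More precisely, for any $a \in (0, 1)$, let $R = R(a, \varepsilon, x, t) > 0$ be such that

\[
|z| \le R \implies \left| 1 - \exp\left[ -\sum_{\ell=2}^{\infty} \frac{\varepsilon^{\ell-1}}{\ell!} \bigl\langle D^\ell E(t,x), z^{\otimes \ell} \bigr\rangle \right] \right| \le a. \tag{6.4}
\]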



Converting to spherical polar coordinates yields that, for some constant $c_n$ depending only on $n$,

\[
I_{k,\varepsilon}(w) \le c_n a \int_0^R r^{k+n-1} e^{-m(w) r} \, dr + c_n \int_R^{+\infty} r^{k+n-1} e^{-m(w) r} \, dr.
\]

This estimate is valid for any $a \in (0, 1)$ and corresponding $R$. The above integrals can be evaluated exactly using the recurrence relation

\[
\int x^n e^{cx} \, dx = \frac{1}{c} x^n e^{cx} - \frac{n}{c} \int x^{n-1} e^{cx} \, dx;
\]

the resulting polynomial-exponential expressions are a bit cumbersome to deal with, but only the leading-order contributions as $\varepsilon \to 0$ are of interest here.
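
To make the recurrence concrete, the following Python sketch evaluates the two radial integrals $\int_0^R r^p e^{-m r} \, dr$ and $\int_R^{+\infty} r^p e^{-m r} \, dr$ for an integer exponent $p$ (which plays the role of $k + n - 1$) and a rate $m > 0$ (which plays the role of $m(w)$). It applies integration by parts with $c = -m$; the function names are hypothetical, and the tail is obtained from the full moment $p! / m^{p+1}$, an identity assumed here rather than quoted from the text.

```python
import math


def truncated_moment(p, m, R):
    """Integral of r**p * exp(-m*r) over [0, R] for integer p >= 0 and m > 0.

    Uses J_q = -(R**q) * exp(-m*R) / m + (q / m) * J_{q-1}, the
    integration-by-parts recurrence with c = -m, starting from
    J_0 = (1 - exp(-m*R)) / m.
    """
    value = (1.0 - math.exp(-m * R)) / m
    for q in range(1, p + 1):
        value = -(R**q) * math.exp(-m * R) / m + (q / m) * value
    return value


def tail_moment(p, m, R):
    """Integral of r**p * exp(-m*r) over [R, +inf).

    Computed as the full moment p!/m**(p+1) minus the truncated part; this
    subtraction loses relative accuracy when R is very large.
    """
    return math.factorial(p) / m ** (p + 1) - truncated_moment(p, m, R)


if __name__ == "__main__":
    print(truncated_moment(1, 1.0, 1.0))   # equals 1 - 2/e
    print(tail_moment(2, 1.0, 5.0))        # equals 37 * exp(-5)
```

The truncated integral stays bounded by $p! / m^{p+1}$ for every $R$, while the tail is a polynomial in $R$ times $e^{-mR}$; these are the two behaviours that the estimate above combines.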

We now make a specific choice of $a$ and $R$ such that $a \to 0$ and $R \to \infty$ at the right relative rates. Let $a := \varepsilon^{1/2}$ and let

R~-log(-±log(l-e 1 / 2 )). 



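For any $z \in \mathbb{R}^n$,

\[
\begin{aligned}
\left| \sum_{\ell=2}^{\infty} \frac{\varepsilon^{\ell-1}}{\ell!} \bigl\langle D^\ell E(t,x), z^{\otimes \ell} \bigr\rangle \right|
&\le \sum_{\ell=2}^{\infty} \frac{\varepsilon^{\ell-1}}{\ell!} \| D^\ell E(t, \cdot) \|_{\mathrm{op}} |z|^\ell \\
&\le \frac{1}{\varepsilon} \sum_{\ell=2}^{\infty} \frac{(\varepsilon |z|)^\ell}{\ell!} M \\
&\le \frac{M}{\varepsilon} \exp(\varepsilon |z|),
\end{aligned}
\]

and if $|z| \le R$, then

\[
\sum_{\ell=2}^{\infty} \frac{\varepsilon^{\ell-1}}{\ell!} \bigl| \bigl\langle D^\ell E(t,x), z^{\otimes \ell} \bigr\rangle \bigr| \le -\log(1 - \varepsilon^{1/2}),
\]

as required for (6.4) to hold.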

By l'Hôpital's rule, for this choice of $a$ and $R$, $a \to 0$ and $R \to +\infty$ as $\varepsilon \to 0$, and there exist constants $c_1$, $c_2$ such that

h,H < ae 1/2 -4t J + C2 ( -TT J e ~ m[W)R - 

\m(w) J \m(w) J 

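The dominant term here is the $\varepsilon^{1/2}$ term, since $R^{k+n-1} e^{-m(w) R}$ not only tends to $0$, but does so with all derivatives tending to zero as well; $m(w)$ is bounded away from zero for $w \in K \Subset -\mathscr{E}$. Thus, there is a constant $C'_k$ (dependent on $k$ and the other geometric parameters, but not on $\varepsilon$) such that

\[
\sup_{w \in K} I_{k,\varepsilon}(w) \le C'_k \varepsilon^{1/2} \quad \text{for all small enough } \varepsilon > 0.
\]

Thus, by (6.3), as claimed,

\[
\sup_{w \in K} \bigl| D^k \Psi_\varepsilon^\star(w) - D^k \Psi_0^\star(w) \bigr| \le C'_k \varepsilon^{1/2} + C' \varepsilon^{1/2} \in O(\varepsilon^{1/2}) \quad \text{as } \varepsilon \to 0. \qquad \square
\]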

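Lemma 6.3. Suppose that $y^\varepsilon$, $\varepsilon > 0$, and $y^0$ solve

\[
\begin{aligned}
\dot{y}^\varepsilon &= -D\Psi_\varepsilon^\star(DE(t, y^\varepsilon)), \\
\dot{y}^0 &= -D\Psi_0^\star(DE(t, y^0)),
\end{aligned}
\]

with initial conditions $y^\varepsilon(0) = y^0(0) = x_0$ such that $-DE(0, x_0) \in \mathscr{E}$, and that (4.10) holds. Then there exists a constant $C > 0$ such that

\[
\sup_{t \in [0,T]} | y^\varepsilon(t) - y^0(t) | \le C \varepsilon^{1/2} \quad \text{for all small enough } \varepsilon > 0.
\]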

Proof. The strategy is to appeal to Lemma 6.2 and Gronwall's inequality. First, note that there exists a $K \Subset \mathscr{E}$ such that $-DE(t, y^0(t)) \in K$ for all $t \ge 0$, i.e.

\[
\inf_{t \in [0,T]} \operatorname{dist}\bigl( -DE(t, y^0(t)), \partial\mathscr{E} \bigr) > 0,
\]

for otherwise, since $\Psi_0^\star$ blows up to $+\infty$ on $-\partial\mathscr{E}$ (by Proposition 4.2(5)), it would follow that $t \mapsto \Psi_0^\star(DE(t, y^0(t)))$ blows up to $+\infty$ in finite time, which would contradict (4.10).

Assume that $\varepsilon > 0$ is small enough that the conclusion of Lemma 6.2 holds. Then, by Lemma 6.2, there exists $C > 0$ such that

\[
\sup_{t \in [0,T]} \bigl| D\Psi_\varepsilon^\star(DE(t, y^0(t))) - D\Psi_0^\star(DE(t, y^0(t))) \bigr| \le C \varepsilon^{1/2}.
\]

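Let $L$ be the product of the (finite) Lipschitz constants for $DE$ and $D\Psi_\varepsilon^\star|_K$. Then, by Gronwall's inequality, for all $t \in [0,T]$,

\[
| y^\varepsilon(t) - y^0(t) | \le \frac{C \varepsilon^{1/2}}{L} \bigl( e^{Lt} - 1 \bigr). \qquad \square
\]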
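
The mechanism of this proof can be illustrated numerically. The Python sketch below compares explicit Euler iterates for two scalar vector fields that differ by at most $\delta = \varepsilon^{1/2}$ and checks the gap against a standard form of Gronwall's bound, $(\delta / L)(e^{Lt} - 1)$. The fields are stand-ins chosen for illustration ($-\sin y$ with Lipschitz constant $L = 1$, plus a bounded perturbation); they are not the fields $-D\Psi_\varepsilon^\star \circ DE$ of the lemma, and all names are hypothetical.

```python
import math


def euler_path(field, y0, h, n_steps):
    """Explicit Euler iterates y_{i+1} = y_i + h * field(y_i)."""
    path = [y0]
    for _ in range(n_steps):
        path.append(path[-1] + h * field(path[-1]))
    return path


def gronwall_bound(delta, lipschitz, t):
    """Bound (delta / L) * (exp(L*t) - 1) for fields that differ by at most delta."""
    return delta / lipschitz * math.expm1(lipschitz * t)


def base_field(y):
    """Stand-in for the limiting field; its Lipschitz constant is 1."""
    return -math.sin(y)


def make_perturbed_field(eps):
    """Stand-in for the eps-dependent field; differs from base_field by at most sqrt(eps)."""
    delta = math.sqrt(eps)
    return lambda y: base_field(y) + delta * math.cos(3.0 * y)


def max_gap_and_bound(eps, T=2.0, h=1e-3, y0=1.0):
    """Largest gap between the two Euler paths on [0, T], and the Gronwall bound at T."""
    n_steps = int(round(T / h))
    reference = euler_path(base_field, y0, h, n_steps)
    perturbed = euler_path(make_perturbed_field(eps), y0, h, n_steps)
    gap = max(abs(a - b) for a, b in zip(perturbed, reference))
    return gap, gronwall_bound(math.sqrt(eps), 1.0, T)


if __name__ == "__main__":
    for eps in (1e-2, 1e-4):
        print(eps, max_gap_and_bound(eps))
```

For Euler iterates the gap $e_i$ satisfies $e_{i+1} \le (1 + hL) e_i + h\delta$, so the discrete paths obey the same bound as the continuous solutions, and the gap is of order $\varepsilon^{1/2}$ on a fixed time interval.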

Lemma 6.4. Consider a uniform partition $P$ of $[0, T]$ with $[P] = h > 0$. Let $X$ be the Markov chain generated by $E$ (convex) and $\Psi$ (as usual) with $\varepsilon = h$, and assume that (4.9) holds. Let $(y_i)_{i=1}^{T/h}$ be the Euler approximation to

\[
\dot{y} = -D\Psi_h^\star(DE(t, y(t)))
\]

given by

\[
\Delta y_i := -h \, D\Psi_h^\star(DE(t_{i+1}, y_i)),
\]

with $X_0 = y_0$ such that $-DE(0, y_0) \in \mathscr{E}$. Then, for every $\eta > 0$,

\[
\mathbb{P}\left[ \max_{0 \le i \le T/h} | X_i - y_i | \ge \eta \right] \in O(h) \quad \text{as } h \to 0. \tag{6.5}
\]



Proof. In order to simplify the notation, assume that the partition $P$ is a uniform partition with $[P] = h > 0$, and define a time-dependent vector field $f_h$ by $f_h(t, x) := -D\Psi_h^\star(DE(t, x))$. Fix $\delta > 0$ small enough that

\[
K(t) := \{ x \in \mathcal{S}(t) \mid \operatorname{dist}(x, \partial\mathcal{S}(t)) \ge \delta \}
\]

is non-empty and contains $y(t)$ for every $t \in [0, T]$. Furthermore, using the Lipschitz assumption on $DE$, assume that $\delta$ is small enough that $K(t_i) \Subset \mathcal{S}(t_{i \pm 1})$ for each $i$. Write (dropping the superscript that indicates the partition $P$ or its mesh size $h$)

\[
\begin{aligned}
X_{i+1} &= X_i + h f_h(t_{i+1}, X_i) + \Xi_{i+1}(X_i), \\
y_{i+1} &= y_i + h f_h(t_{i+1}, y_i).
\end{aligned}
\]

By Lemma 3.1 (or, more precisely, its generalization to non-quadratic $E$ through (3.2)), for each $x$, $\Xi_{i+1}(x)$ is a random variable with mean $0$ and $k$-th central moment at most $C_k(x) h^k$. The deviations $Z := X - y$ satisfy

\[
Z_{i+1} = Z_i + h \bigl( f_h(t_{i+1}, X_i) - f_h(t_{i+1}, y_i) \bigr) + \Xi_{i+1}(X_i). \tag{6.6}
\]

In summary, we have the following facts:

(M) $f_h(t_{i+1}, \cdot)$ is a monotonically decreasing vector field on $\mathcal{S}(t_{i+1})$;
(B) $f_h(t_{i+1}, \cdot)$ is bounded on compactly-embedded subsets of $\mathcal{S}(t_{i+1})$;
(Z) for every $x$, $\mathbb{E}[\Xi_{i+1}(x)] = 0$.

Let $\mathcal{K}_i$ be (the $\sigma$-algebra generated by) the event that $X_j \in K(t_{j+1})$ for $0 \le j \le i$. Applying the conditional expectation operator $\mathbb{E}[\,\cdot \mid \mathcal{K}_i]$ (which is never conditioning






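on an event of zero probability) to the Euclidean dot product of (6.6) with itself yields that

\[
\begin{aligned}
\mathbb{E}[|Z_{i+1}|^2 \mid \mathcal{K}_i] - \mathbb{E}[|Z_i|^2 \mid \mathcal{K}_i]
&= 2h \, \mathbb{E}[(f_h(t_{i+1}, X_i) - f_h(t_{i+1}, y_i)) \cdot Z_i \mid \mathcal{K}_i] && \le 0 \text{ by (M)} \\
&\quad + 2 \, \mathbb{E}[Z_i \cdot \Xi_{i+1}(X_i) \mid \mathcal{K}_i] && = 0 \text{ by (Z)} \\
&\quad + 2h \, \mathbb{E}[(f_h(t_{i+1}, X_i) - f_h(t_{i+1}, y_i)) \cdot \Xi_{i+1}(X_i) \mid \mathcal{K}_i] && = 0 \text{ by (Z)} \\
&\quad + h^2 \, \mathbb{E}[|f_h(t_{i+1}, X_i) - f_h(t_{i+1}, y_i)|^2 \mid \mathcal{K}_i] && \le C h^2 \text{ by (B)} \\
&\quad + \mathbb{E}[|\Xi_{i+1}(X_i)|^2 \mid \mathcal{K}_i] && \le C h^2 \text{ by Lemma 3.1} \\
&\le C h^2,
\end{aligned}
\]

and application of the unconditional expectation operator to both sides yields the following uniform bound for the second moment of the deviations:

\[
\max_{0 \le i \le T/h} \mathbb{E}[|Z_i|^2] \le C T h. \tag{6.7}
\]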

Inequality (6.7) can be used to "bootstrap" a similar inequality for the fourth moments. Define a tetralinear form $\tau \colon (\mathbb{R}^n)^4 \to \mathbb{R}$ by

\[
\tau(w, x, y, z) := (w \cdot x)(y \cdot z), \tag{6.8}
\]

so that $|x|^4 = \tau(x, x, x, x)$. This tetralinear form is invariant under arbitrary compositions of the following interchanges of entries: $(1,2)$, $(3,4)$ and $(1,3)(2,4)$. The Cauchy-Bunyakovskiĭ-Schwarz inequality for the Euclidean inner product implies a corresponding inequality for this tetralinear form: for all $w, x, y, z \in \mathbb{R}^n$,

\[
|\tau(w, x, y, z)| \le |w| \, |x| \, |y| \, |z|. \tag{6.9}
\]
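
The stated properties of the tetralinear form are easy to test numerically. The following Python sketch (hypothetical helper names, random test vectors in a dimension chosen only for illustration) checks the three interchange symmetries, the identity $\tau(x,x,x,x) = |x|^4$ and inequality (6.9) on random samples.

```python
import math
import random


def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def tau(w, x, y, z):
    """Tetralinear form tau(w, x, y, z) = (w . x) * (y . z), as in (6.8)."""
    return dot(w, x) * dot(y, z)


def norm(a):
    """Euclidean norm."""
    return math.sqrt(dot(a, a))


def random_vector(rng, n):
    """Vector with independent entries drawn uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def check_tau_properties(n=3, trials=200, seed=0):
    """Return True if the symmetries, the quartic identity and (6.9) hold on all samples."""
    rng = random.Random(seed)
    tol = 1e-12  # allowance for floating-point rounding
    for _ in range(trials):
        w, x, y, z = (random_vector(rng, n) for _ in range(4))
        t = tau(w, x, y, z)
        symmetric = (
            abs(t - tau(x, w, y, z)) <= tol       # interchange (1,2)
            and abs(t - tau(w, x, z, y)) <= tol   # interchange (3,4)
            and abs(t - tau(y, z, w, x)) <= tol   # interchange (1,3)(2,4)
        )
        bounded = abs(t) <= norm(w) * norm(x) * norm(y) * norm(z) + tol
        quartic = abs(tau(x, x, x, x) - norm(x) ** 4) <= tol
        if not (symmetric and bounded and quartic):
            return False
    return True


if __name__ == "__main__":
    print(check_tau_properties())
```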

Hence, $\mathbb{E}[|Z_{i+1}|^4] = \mathbb{E}[\mathbb{E}[|Z_{i+1}|^4 \mid \mathcal{K}_i]]$ can be expanded using the tetralinear form (6.8) and (6.6) and each term estimated as in the derivation of (6.7). By (Z), those terms containing precisely one $\Xi_{i+1}(X_i)$ have zero expectation; the terms of the form

\[
\mathbb{E}\bigl[ \tau\bigl( Z_i, Z_i, Z_i, h (f_h(t_{i+1}, X_i) - f_h(t_{i+1}, y_i)) \bigr) \bigm| \mathcal{K}_i \bigr]
\]

are non-positive by (M); the remaining terms can all be estimated using (B), (6.7), (6.9) and Lemma 3.1, with the worst bound being $O(h^3)$. Thus, the following uniform bound for the fourth moment of the deviations holds:

\[
\max_{0 \le i \le T/h} \mathbb{E}[|Z_i|^4] \le C T h^2. \tag{6.10}
\]

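Hence, for $\eta > 0$,

\[
\begin{aligned}
\mathbb{P}\bigl[ |Z_i| \ge \eta \text{ for some } 0 \le i \le T/h \bigr]
&\le \sum_{i=0}^{T/h} \mathbb{P}\bigl[ |Z_i| \ge \eta \bigr] \\
&\le \sum_{i=0}^{T/h} \eta^{-4} \, \mathbb{E}[|Z_i|^4] && \text{by Chebyshev's inequality} \\
&\le \eta^{-4} C T^2 h && \text{by (6.10),}
\end{aligned}
\]

which establishes (6.5) and completes the proof. □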
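
The second-moment bound (6.7) can be observed in a toy simulation. The Python sketch below is only an illustration under assumptions that are not taken from the text: the drift is $f(x) = -x$ (a monotonically decreasing stand-in for $f_h$), and the noise is Gaussian with mean zero and standard deviation $c h$ (so that its second moment is of order $h^2$, as for $\Xi_{i+1}$). All names are hypothetical. In this linear case $Z_{i+1} = (1 - h) Z_i + \Xi_{i+1}$, so that $\mathbb{E}|Z_i|^2 \le c^2 h^2 i \le c^2 T h$.

```python
import random


def drift(x):
    """Monotonically decreasing stand-in for f_h (illustrative choice, not from the text)."""
    return -x


def max_second_moment(h, T=1.0, c=1.0, n_paths=2000, seed=0):
    """Monte Carlo estimate of max_i E|X_i - y_i|**2 for noisy versus plain Euler steps.

    The noise is Gaussian with mean zero and standard deviation c*h, which
    mimics an increment whose second central moment is of order h**2.
    """
    rng = random.Random(seed)
    n_steps = int(round(T / h))
    sums = [0.0] * (n_steps + 1)
    for _ in range(n_paths):
        x = y = 1.0
        for i in range(n_steps):
            x = x + h * drift(x) + rng.gauss(0.0, c * h)  # noisy chain
            y = y + h * drift(y)                          # deterministic Euler scheme
            sums[i + 1] += (x - y) ** 2
    return max(total / n_paths for total in sums)


if __name__ == "__main__":
    for h in (0.1, 0.05):
        print(h, max_second_moment(h), "bound c^2 * T * h =", h)
```

Halving the step size roughly halves the estimated second moment, which is the linear-in-$h$ behaviour expressed by (6.7).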